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GENERAL  INTRODUCTION 


The  present  final  report  chronicles  various  aspects  of  the 
research  conducted  by  the  principal  investigator  related  to  the 
wavefront  deconvolution  problem.  The  material  is  summarized  in 
the  form  of  three  self-contained  sections,  each  devoted  to  a 
single  item.  The  three  sections  are: 

Section  1:  A  numerically  stable  iterative  method  for 
the  inversion  of  wavefront  aberrations  from  measured 
point  spread  function  data. 

Section  2:  Optimum  balanced  wavefront  aberrations 
for  radially  symmetric  amplitude  distributions; 
generalizations  of  Zernike  polynomials. 

Section  3:  Application  of  filtered  singular  value 
decomposition  to  wavefront  deconvolution. 


SECTION  1 

A  NUMERICALLY  STABLE  ITERATIVE  METHOD  FOR  THE 
INVERSION  OF  WAVEFRONT  ABERRATIONS  FROM 
MEASURED  POINT  SPREAD  FUNCTION  DATA 

ABSTRACT 

This  paper  outlines  a  method  for  the  determination  of  the 
unknown  wavefront  aberration  function  of  an  optical  system  from 
noisy  measurements  of  the  corresponding  point  spread  function. 
The  problem  is  cast  as  a  nonlinear  least  squares  estimation 
problem  for  the  values  of  the  wavefront  aberration  function  at 
N  points  over  the  slit  aperture,  from  measurements  of  the  point 
spread  function  at  M  points  with  M  >  N.  Newton's  method  is  used 
to  replace  the  nonlinear  minimization  problem  with  a  sequence 
of  linear  problems.  Each  such  problem  requires  the  inversion 
of  the  Hessian  matrix  of  the  error  metric  which  is  shown  to  be 
both  singular  (with  rank  <  N-l)  and  ill-conditioned.  To 
overcome  singularity,  the  pseudoinverse  is  used;  to  overcome 
ill-conditioning  the  pseudoinverse  is  calculated  using  singular 
value  decomposition  and  the  singular  values  then  filtered. 
Attention  is  drawn  to  difficulties  such  as  nonuniqueness, 
sensitivity  of  algorithms  to  initial  guess,  etc.,  the 
ancillary  mathematical  details  being  set  out  in  appendices. 

Some  illustrative  numerical  results  are  presented  and  analyzed. 


1.  INTRODUCTION 


There  are  a  number  of  situations  of  current  interest  where 
one  is  required  to  determine  the  wavefront  aberration  function  of 
the  optical  system  from  measurements  of  the  corresponding  point 
spread  function  (e.g.,  image  forming  adaptive  optical  systems, 
laser  beam  forming,  etc.).  Published  papers  specifically  devoted 
to  this  problem  are:  Gonsalves  [1]  and  Southwell  [2].  The 
general  problem  of  phase  retrieval  from  modulus  data,  of  which 
this  is  a  typical  situation,  has  been  attacked  by  a  variety  of 
methods  almost  too  diverse  to  catalog  and  we  refer  the  reader  to 
the  vast  literature  for  details.  Some  typical  references  are 
[3-10]. 

Our  approach  to  the  problem  rests  upon  two  provisos: 

1.  The  aberrated  wavefront  itself  is  the  primary  artifact  of 
the  inversion,  not  an  assumed  functional  form  of  it.  Curve  fitting 

the  reconstructed  wavefront  can  be  done  after  the  inversion,  if 
desired. 


2.  The  inversion  of  the  wavefront  from  the  measured  point 
spread  function  involves  the  solution  of  a  nonlinear  integral 
equation  of  the  first  kind,  Eq.  2.2,  with  the  attendant  numerical 
instability  as  befits  an  ill-posed  problem  [11,12].  The  nonlinear 
inversion  method  is  therefore  tailored  to  be  robust  with  respect 


1-1 


to  noise  in  the  measured  point  spread  function. 

Given  these  two  provisos,  we  have  chosen  to  cast  the 
problem  of  determining  the  wavefront  as  a  nonlinear  least  squares 
estimation  problem  and  brought  to  bear  the  powerful  tools  of 
modern  numerical  analysis  towards  a  solution. 


The  plan  of  the  paper  is  as  follows.  Section  2  is  devoted 
to  the  necessary  preliminary  material.  In  section  3  is  discussed 
the  strategy  of  the  nonlinear  least  squares,  while  section  4 
contains  the  tactics  (filtered  singular  value  decomposition,  the 
scaling  conditions,  stopping  rule,  etc.)  required  for  performing 
the  inversion.  The  ancillary  mathematical  details  are  relegated  to 
a  series  of  Appendices.  Finally  some  numerical  results  are 
discussed  in  section  5. 

Although  our  analysis  is  couched  in  the  specifics  of  wavefront 
aberration  and  point  spread  function,  the  formalism  is  independent 
of  the  specific  situation  and  is  applicable  to  the  phase  retrieval 
problem  in  general,  for  phases  having  compact  support. 
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2.  PRELIMINARIES 


The  measured  point  spread  function  will  be  termed  xm  where 
m  =  1,2,...,  M  are  the  indexed  values.  It  will  be  convenient  to 
write 

l 

Ti  ! 

T, 


T  = 


m 


We  take  as  the  diffraction  model 

+  1 

In  I 

t(v)  = 


1 
2  1 


eiw(p)  eivp  dp 


-1 


(2.1) 


(2.2) 


where  w(p)  =  (27r/A)W(p)  is  the  wavefront  aberration  function  measured 
in  wavenumber  units  (27r/A).  Amplitude  variations  over  the  exit 
pupil  are  not  allowed  in  this  version.  Although  this  can  be  in¬ 
cluded  In  the  analysis  we  prefer  to  omit  it  in  order  to  focus  on 
what  we  believe  to  be  the  more  important  issues. 


The  diffraction  model  is  determined  by  N  free  parameters,  namely 
the  values  of  the  wavefront  aberration  function  W(p)  at  N  points 
over  the  slit  aperture.  The  points  p^  p2,...,  p^  need  not  be 
equally  spaced.  We  choose  to  make  them  equally  spaced  and  to  let 
N  be  an  odd  integer  so  as  to  Include  the  point  p=0.  The  aberration 
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function  must,  by  definition,  satisfy  the  requirement 


W(0)  =  0  (2.3) 

In  order  to  carry  out  this  program,  we  discretize  the  integral 

N 


t  (v  )  =  t  = 
m  m 


1 

2 


iv  p  iw 
a  e  nr  n  e  n 
n 


(2.-4) 


n=l 


Here  p^  are  the  quadrature  points  and  an  the  corresponding  weight 
factors.  Our  calculations  were  performed  using  a  traperzoidal 
rule.  We. have  also  set  wn  =  w(pn). 


It  is  again  convenient  to  write  the  N  w^  parameters  as  a 
column  vector  of  length  N 


W  = 


wn 


w. 


(2.5) 


w 


N 


In  this  condensed  notation,  Eq .  2.4  becomes 


t1(W),  t2(W),  ...,  tM(w) 


(2.6) 


indicating  that  each  value  of  t  is  related  to  all  the  w  .  In 

m  n 

A 

the  direct  problem,  we  are  given  W  and  are  required  to  calculate 
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The  inverse  (or  inversion)  problem  relates  the  unknown  wave- 
front  values  to  the  known  (measured )point  spread  function  data 
t 2 •  •  •  >  t jyj  via  the  nonlinear  functional  relation 


t  =  t  (W),  m  =  1, 
mm’ 


M 


(2.7) 


This  can  be  set  into  the  more  succinct  form 


t  =  t (W) 


(2.8) 


upon  defining  the  additional  vector 


t  (W) 


t1  (W) 
t2(W) 


(2.9) 


tH(W) 


Equation  2.8  is  to  be  interpreted  as  a  nonlinear  system  of  M 
equations  in  N  unknowns.  Enough  data  t  must  be  given  to  allow 
smoothing  of  the  experimental  errors  in  the  diffraction  model. 
Consequently,  M  >  N  (or  at  the  least  M  >  N)  so  the  system  in 
Eq .  2.8  is  formally  overdetermined. 
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3.  NONLINEAR  LEAST  SQUARES:  STRATEGY 


We  will  attempt  to  "solve"  the  inverse  problem  by  requiring 

A  A  A 

that  t(W)  matches  t  in  the  •  ■  that 

e(w)  =  y ;tm  -  t^iw)]2  (3.D 

:n=l 

be  "  inimir.eJ  when  considered  as  a  function  of  W.  in  other  words, 
we  approach  the  inversion  as  an  unconstrained,  nonlinear  least  squares 
problem.  (The  fact  that  w,.,  .  Wo  =  0,  Is  m  f  a  real  constraint  since 

{  iM-  l )  /  c 

it  amounts  to  translating  the  entire  W  vector  by  a  constant  and  it  is 

shown  in  Appendix  that  t(W)  is  invariant  under  such  a  translation). 
Upon  defining  the  vector 

$(W)  =  t(W)  -  t  (3-2) 


it  is  a  simple  matter  to  rewrite  E  as 

E(W)  =  $+ (W)0(W) 


(3-3) 


Iterative  methods  will  be  employed  in  that  we  will  replace 
the  nonlinear  problem  of  M  equations  in  N  unknowns  by  a  sequence 
of  linear  least  squares  problems. 


The  Taylor  series  of  the  objective  function  E(W)  can  be  used 

to  approximate  the  minimum  value  of  E  from  points  W  near  to  the 

minimum  W  .  by  setting  W  ,  =  W  +  W.  Consequently 

min  min 


E(W  .  )  m  E ( W ) 
'  mi  n 


3E(W)  A ... 

9w  ‘  "”n 
n 


E  E 

j  =  i 


L-  1 


(  3  ■  '4  ) 
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I',  fai-  tl  i  •  .  ...  .  ...  J  \  .  ...  A  .J.  . 


upon  neglecting  cubic  and  higher  powers  of  AW.  The  vector  AW  is 
as  yet  unknown  and  our  object  is  to  determine  this  vector  (of 
parameter  corrections)  which  will  approximate  the  minimum  of  E  from 

A 

the  point  W. 


As  before,  it  will  be  convenient  to  work  in  matrix  notation. 

A 

Let  us  define  g,  the  Jacobian  gradient  vector,  of  E  as 

J-  IS- I  0.5) 


A 

and  H,  the  Hessian  matrix  of  E,  as 


3w^3w2 


3w,  3w.t 
1  N 


3w23w1 


3w23^N 


(3.6) 


3w.,3w  3w..3w0 

N  1  N  2 


Note  that  H  is  a  symmetric  NxN  matrix.  A  further  matrix  which 
we  will  utilize  is  the  Jacobian  matrix 


1  >  * 


G  = 


30] 

3w, 


3  <p  ■ 

t 

3w. 


3  4> 
3w- 


M 


3<l>] 

3w. 


3  <j> , 
_ £ 

3w, 


3  <p 


M 


3w, 


3<J>- 


3w 


N 


3  <}) , 


3w 


N 


3<j> 


M 


3w 


N 


(3.7) 


This  matrix  is  generally  not  square  because  M  >_  N.  Given  these 
matrices,  we  can  rewrite  Eq.  3.4  in  matrix  form  as 


A  i  A  A 


E(Wmln)  ^  E(W)  +  S+AW  +  |AW+HAW 


(3-8) 


The  elements  of  g  and  H  can  be  expressed  directly  in  terms 


of  <6  .  Now 
m 


3E 


'M 


3w 


=  2 


n 


3  $ 
m  3w 


m 


m=l 


n 


(3-9) 


which  can  be  cast  into  matrix  form  as 


A  A^,  A 

e  =  2G  <p 


(3-10) 


The  corresponding  elements  of  the  NxN  Hessian  matrix  of  E  are 


2 

3  E 


3Wj  3w^ 


+ 


32<J> 

m 

3w .  3w 
J 


l 


(3-11) 


with  j,  l  =  1 ,  2,  ...  N.  In  some  versions  of  least  squares,  the 

A  A  |  A 

second  term  is  neglected  so  that  H  can  be  approximated  by  2G  G. 
However,  we  will  not  make  this  approximation.  See  Appendix  A  for 
the  explicit  expressions  in  terms  of  our  diffraction  model,  Eq. 


2.-4. 


A 

To  determine  that  value  of  AW  which  makes  E  stationary,  we 
equate  to  zero  the  gradient  of  E  keeping  g  and  H  fixed.  The  result 
is 

HAW  +  g  =  0  (3.12) 


or 

A  A  /N  ,  Av 

HAW  =  -2G  9  (3-13) 

A  A  A  A  A 

where  H,  G,  and  <j>  are  evaluated  at  W.  The  solution  AW  of  this 
system  of  linear  equations  gives  the  fundamental  second  order 
increment  towards  the  minimum  of  E. 


Equation  3.13  was  established  from  a  linearization  of  the 
basic  system,  Eq.  2.8.  The  nonlinear  least  square  solution  will 
(hopefully!)  be  reached  after  a  sequence  of  iterations.  After  each 
iteration,  Eq.  3.11)  is  used  to  obtain  a  new  Hes  ;ian  of  partial 
derivatives  and  the  process  is  repeated  until  the  error  metric  E 
stabilizes  (i.e.,  until  no  further  diffraction  model  parameter 
improvements  can  be  usefully  made).  Although  this 
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procedure  looks  straightforward,  it  abounds  with  basic  computa¬ 
tional  pitfalls  which  we  discuss  in  the  next  section. 


4.  NONLINEAR  LEAST  SQUARES:  TACTICS 

The  previous  section  was  devoted  to  strategy,  the  present 
section  with  the  tactics  of  solving  Eqs .  3.13  and  2.8.  The  basic 
difficulty  in  the  calculations  is  the  fact  that  the  Hessian  matrix 

A 

H,  although  N  x  N,  is  at  most  of  rank  (N-l).  See  Appendix  B  for 

^-1 

details.  Consequently,  the  formal  inverse  H  does  not  exist  and 
more  sophisticated  procedures  must  be  employed  in  order  to  "invert" 
Eq.  3-13*  Furthermore  H  is  ill  conditioned  (see  Eq.  4.4). 

We  have  chosen  to  employ  the  method  of  singular  value  decom¬ 
position  (with  an  important  modification)  to  evaluate  the  psuedo- 
inverse  of  H.  Singular  value  decomposition  has  among  its  several 
virtues  the  ability  to  determine  the  rank  of  H  during  the  computa¬ 
tion.  For  those  readers  not  familiar  with  singular  value  decom¬ 
position,  we  have  outlined  a  version  appropriate  to  the  N  *  N 
Hessian  in  Appendix  C. 


The  solution  to  Eq. 

AW  =  - 


3.13  is  given  by  Eq.  B.8 


L  —A -  v 

S,=  l  i 


k  <  N 


(4.1) 


/A  y\  V} 

where  u^  and  v^  are  the  l  column  vectors  of  U  and  V,  respectively. 

/A 

Equation  4.1  shows  that  the  solution  AW  is  a  linear 

combination  of  k  matrices,  v^,  each  of  rank  one  since  u+G+$  is  a 

scalar. 
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The  ill-posed  nature  of  the  inversion  is  directly  evident  in 
Eq .  4.1.  The  smaller  singular  values  entering  into  the  denominator 
of  the  terms  of  the  expansion  tend  to  magnify  greatly  any  error 

A 

in  the  data  vector  <p  resulting  in  a  spurious  solution.  To  correct 
this  state  of  affairs,  the  expansion  is  terminated  in  a  rational 
fashion  b-efore  the  contamination  due  to  the  numerically  small  singu¬ 
lar  values  sets  in. 


Our  choice  to  accomplish  this  is  to  use  a  filtering  procedure. 
We  rewrite  Eq.  4 . 1  in  the  form 

AW  -  -  £  f(a£)  (u£G%)v£  (4.2) 


where  f(a£)  is  a  "filter  function"  depending  on  the  singular 
values  a£.  f(a£)  is  required  to  act  like  1/a £  for  large  a£, 

approach  0  for  very  small  a£;  and,  finally,  to  decrease  from 
l/a£  to  zero  smoothly  in  the  intermediate  range.  A  useful  candi¬ 
date,  evidently  first  used  by  Crone  [23], is 


f(0t) 


L+l  .  L+l 
0 1  +  * 


(4.3) 


where  L  is  some  non-negative  integer  and  q  is  a  non-negative  real 
parameter. 
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Another  way  to  view  the  desirability  of  using  a  filter 

A 

function  is  to  note  that  the  components  of  AW  that  lie  in  the 
direction  corresponding  to  large  singular  values  are  the  important 
ones  insofar  as  reducing  E.  It  is  the  remaining  components  of 
AW  that  cause  numerical  instability  because  they  lie  in  directions 
that  allow  very  large  changes  with  little  effect  on  the  actual 
approximation.  The  filter  function  given  in  Eq .  4.3  attenuates 

A 

these  parasitic  components  of  AW.  Thus,  q  plays  the  role  of  a 
variable  metric  and  is  to  be  chosen  reasonably  large,  q=0(.l), 
when  far  from  the  least  squares  solution  and  decreased  as  the 
iterations  are  sequenced. 

Given  an  initial  estimate  (really  a  guess!)  of  W,  say  Wv 

we  first  perform  a  singular  value  decomposition  of  the  Hessian 
~  ~(1 ) 

matrix  H  evaluated  at  Wv  .  We  solve  for  the  search  direction 

A 

AW  using  Eq.  3.13  employing  a  predetermined  filter  function  and  q 

and  search  along  this  direction  a  distance  s^  for  the  minimum 
*  (  o ) 

W  .  Unlike  the  usual  steepest  descent  method  which  forces  one 

to  search  for  a  minimum  in  the  direction  of  the  negative  gradient 

of  E,  the  present  method  modifies  this  direction  by  the 

peuedoinverse  of  the  Hessian,  which  contains  gradient  and  slope 

'■(2) 

information.  The  new  estimate  is  then  iterated  to  obtain 

r  o  \ 

W  decreasing  q  as  |)AW  j’  decreases. 

/\ 

Due  to  changes  in  H,  the  filter  must  be  modified  at  each 
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A 


stage  in  actual  calculations.  The  Hessian  H  is  almost  always 
ill  conditioned  in  the  sense  that 


°max  ^  amin 


>>  1 


(4.4) 


where  a  and  a  J  are  the  largest  and  smallest  positive 
max  min 

singular  values. 


The  first  consideration  is  scaling,  cr  „  varies  with  M,  N  and 

UlciX 

k.  Since  ill  conditioning  is  defined  relative  to  a  ,  the 

filter  function  f (a^)  defined  in  Eq.  4.3  must  also  be  defined 

relative  to  a  by  appropriate  scaling.  We  set  L  =  2  in  Eq.  4.3 
max 

in  all  the  calculations  reported  in  this  paper.  Furthermore, 
fCa^)  is  scaled  thusly 


f(ot) 


_ 1_ 

a 

max 


(4.5) 


where  p.  =  (o„  /  o  ).  The  scaled  filter  is  made  to  depend  upon 
a  %  max 

the  V.th  iteration  by  choosing  q  in  Eq.  4.3  to  depend  on  k(i.e., 

q  -  V- 


A  reasonable  definition  of  qk  must  depend  upon  the  following 
two  requirements: 


1.  As  converges  to  the  minimum  t  we  want  the  filtered 

inverse  Hessian  to  converge  to  the  true  inverse  Hessian  (i.e., 

_  i 

f(a^)  -*•  )  .  We  monitor  such  convergence  by  noting  the  size 
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f  ir  ) 

of  the  relative  change  y^  in  W  defined  by 

II  vw^ll 


(4.6) 


From  theory  [14,15],  the  convergence  near  the  minimum  should  be 
quadratic,  so  the  filter  convergence  is  made  superlinear  to  take 
some  advantage  of  this  fact. 


2.  As  shown  in  Appendix  D,  a  minimizer  VTU'  exits  in  the 

region  { W :  -ir  <  w^  <  ir}.  Consequently  it  is  desirable  to  keep 
/v  ( k ) 

AW  of  this  order  of  magnitude  to  ensure  iterates  stay  in  this 
region.  So  if  y^  is  large,  we  choose  the  q^  to  pass  only  the 
larger  singular  values. 


With  these  considerations  in  mind,  q  is  chosen  to  be 

Iv 


qk  =  min 


j.l,  30(Yk)1,5  | 


(4.7) 


These  parameters  were  found  by  trial  and  error. 


A  deeper  understanding  of  the  effects  and  advantages  of 

filtering  comes  from  the  observation  that  filtering  is  equivalent 

to  Tikhonov  regularization  [ll,12].  If  no  filter  is  used  then  by 

~(k) 

statements  A  and  B  of  Appendix  C^aW  will  minimize 
|  H(k)  AW(k)  +  g^ll  .  If  a  filter  is  used,  then  there  exists  a 
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scalar  X^_  and  matrix  A^  such  that  AW^"1  will  minimize 

Hh00Aw(*)  +  g^)||  +  Ak|U<*)A5<lc)|j 


(4.8) 


i.e.,  AW  is  a  regularized  least  squares  solution.  According 
to  the  Tikhonov  approach  the  regularized  solution  that  minimizes 
Eq.  4.8  will  be  "smoother"  than  the  least  squares  solution. 


In  the  method  used  here  although  each  linear  problem  is 

regularized  by  filtering,  as  k  increases  x  +  0  so  the 

k 

nonlinear  problem  is  not  regularized,  i.e.,  we  are  minimizing 

A 

iteratively  the  nonlinear  function  E(W) ,  not  the  regularized 

A  A 

function  E(W)  +  X  ||w||  .  Viewing  the  filter  as  a  regularizer 
shows  us  clearly  that  filtering  places  emphasis  on  those 

A 

directions  in  which  E(W)  is  fastest  in  decreasing,  and  neglects 

/\ 

directions  in  which  E(W)  is  slowly  decreasing,  thereby  trading 
these  off  for  a  AW  of  small  -norm.  Thus  W*^  stays  as  close  as 
possible  to  Wk  1,  a  good  policy  if  W^1^  is  a  good  initial  guess. 


Newton's  method  performs  well  as  long  as  care  is  taken  in 

,  4  o) 

inverting  the  Hessian  and  a  "good"  initial  guess  for  W  is 
available.  There  are  bounds  on  an  initial  guess,  expressed  in 

A 

terms  of  the  higher  derivatives  of  E(W) ,  which  guarantee  the 
convergence  of  Newton's  method  from  that  guess;  this  is  the 


essence  of  the  Newton-Kantorovich  theorem  [25, 16].  Such  bounds  are 


generally  very  conservative  and  also  hard  to  evaluate.  However 
due  to  the  simple  analytical  form  of  E(W)  the  calculations  can  be 
done  although  we  have  as  yet  not  performed  this  task.  There  is 
an  algorithm  due  to  Kung  [2  7]  which  makes  use  of  these  bounds 
to  generate  a  succession  of  Newton  iterates  that  will  converge 
from  any  initial  guess. 

There  are  available  other  methods  which  produce  a  sequence 

A 

that  decreases  E(W)  until  a  sufficiently  good  starting  point  for 
Newton's  method  is  reached  (e.g.,  the  simplex  method  of  Nelder 
and  Mead  [2S]  as  described  in  Daniels  [23]).  However  the 
difficulty  is  to  determine  when  such  a  point  is  reached  without 
calculating  the  bounds  mentioned  above. 

To  circumvent  this  aspect  of  the  problem  and  avoid  using 
another  minimizing  algorithm,  we  applied  the  following 
procedure : 


1.  Run  Newton's  method  for  N  very  small  from  an  arbitrary 


initial  guess  to  achieve  a  minimum  W. 


(0) 

N 


W 


(0) 

N 


-(1) 

2.  Increase  N  to  N' ,  produce  W^f  by  interpolation  over 


-(1) 

3.  Run  Newton's  method  with  W^,  as  an  initial  guess  until 


,(0) 


it  converges  on  W^,  .  Return  to  2. 
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The  algorithm  is  repeated  until  a  sufficiently  large  N  is 
reached.  In  practice  it  was  found  that  N  =  5  was  a  reasonable 
starting  N  value.  However  N'  could  not  be  increased  markedly 
over  N.  Limits  on  available  computation  time  did  not  allow  us 
to  experiment  to  determine  optimal  procedures. 

The  final  "tactical"  problem  to  be  discussed  is  the  stopping 

~(k) 

rule  (i.e.,  the  criterion  used  to  decide  when  W  is  sufficiently 
close  to  W'  so  that  the  iterative  algorithm  can  be  halted) .  The 
stopping  rule  was  made  independent  of  E(W)  for  two  reasons.  The 
first  is  that  the  shape  and  magnitude  of  the  surface  defined  by 

A  A 

E(W)  depends  on  N,M  and  the  measurement  noise  in  t,  so  that 

A 

criteria  using  E(W)  would  be  too  problem  dependent  to  allow 
comparison  of  results. 

The  second  reason  is  that  E(W)  can  be  insensitive  to  large 

A  A 

changes  in  W  (this  is  indicated  by  the  ill  conditioning  of  H) . 

Thus  a  stopping  rule  based  on  changes  in  E(W)  only,  can  halt  the 
iterative  algorithm  far  from  a  minimum. 

~  (k) 

For  these  reasons,  a  relative  error  based  on  W  was 
first  considered  manely:  halt  Iteration  algorithm  when 


y,  <  e 
'  k 


(4.9) 


where  y^  is  given  by  Eq.  4.6  and  e  is  a  specified  constant.  Trial 
calculations  showed  that  the  iterative  algorithm  often  took  a 
small  step  followed  by  a  large  step  (typical  behavior  of  a 
minimization  algorithm  descending  a  long*  curving  valley  with 
steep  walls,  e.g.,  Rosenbrook  function  [20]).  Thus  to  avoid 
stopping  after  a  small  step  while  still  far  from  the  minimum, 
the  stopping  rule  finally  adopted  was:  halt  iteration  algorithm 
when 


Yk+i  +  £ 


(4.10) 


5. 


SOME  NUMERICAL  RESULTS 


To  test  the  numerical  workings  of  the  entire  algorithm,  we 
considered  a  known  wavefront  aberration  function  W(p)  and  from  it 
calculated  the  diffraction  model  point  spread  function  t(v  )  using 
Eq .  2.4.  Noise  was  introduced  into  the  measured  point  spread 

A 

function  t  in  a  multiplicative  fashion 


noisy 


=  ( 1  +  <Sp  )  t 


(5.1) 


where  6  is  a  positive  constant  less  than  unity  and  p  is  a  random 
variable  uniformly  distributed  over  (-1,  +1) 


f(y)  =|  | w |  <  l 

=  o  lul  >  1  .  (5.2) 

Values  of  6  used  in  the  present  calculations  are  6  =  0.025  and 
6  =  .05  described  loosely  as  5$  and  10$  noise. 


The  sampled  values  v  were  taken  to  be  v  =  rmr/2  in  accord- 
r  m  m 

ance  with  the  sampling  expansion  [22]  appropriate  to  slit  aper¬ 
ture.  Furthermore,  the  number  M  of  sampled  values  was  taken  to 

A 

be  odd  in  order  to  take  the  maximum  value  of  x  as  m  =  0.  All 
calculations  reported  are  for  N  =  21  and  M  =  21  or  31. 


The  wavefront  aberration  function  was  taken  to  consist  of 
coma  and  spherical  aberration 
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w(p)  s  ~  w(p) 

=  2tt[W3S3(p)  +  W4S4(p)] 

=  2n[W3(p3  -  jr-  p)  +  (p1*  -  j  p2)J  (5-3) 

where  W  and  W4  are  measured  in  wavelength  units  (i.e.,  W3/X, 

W^/X  are  dimensionless).  S3(p)  for  coma  and  S4(p)  for  spherical 
aberration  are  the  slit  aperture  versions  [22]  of  the  Zernike 
polynomials.  The  numerical  calculations  were  carried  out  for 


The  true  wavefront  aberration  function  is  shown  as  a  solid  line 
in  the  succeeding  figures. 

Our  problem  is  to  determine  the  wavefront  aberration  function 
from  the  noisy  sampled  point  spread  function,  Eq .  5.1,  and  compare 
it  with  the  true  wavefront,  Eq.  5-3- 

In  the  first  set  of  calculations,  the  noise  level  was  set  at 
5%  and  the  number  of  sampling  points  M  was  taken  to  be  31.  Follow¬ 
ing  the  procedure  described  in  the  previous  section,  an  initial 
guess  was  iterated  until  a  satisfactory  E  was  achieved  as  per 
the  stopping  algorithm  discussed  in  the  previous  section  with 
e  =  .005.  Approximately  twenty  iteratives  were  performed  to 
achieve  these  levels  of  E.  The  results  of  three  typical  sample 
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realization  reconstructions  of  w(p)  are  shown  in  Fins.  1-3  along 
with  the  corresponding  values  of  E.  The  reconstructed  values  are 
in  excellent  agreement  with  the  true  values  and  do  not  require 
any  detailed  comment. 

In  order  to  test  the  stability  of  the  algorithm  with  respect 
to  the  number  of  sampling  points,  we  next  set  f*=21  (so  that  the  number 
of  sampling  points  equals  the  number  of  reconstructed  wavefront 
points)  and  kept  the  same  noise  level  of  5% .  Two  sample  realiza¬ 
tion  reconstructions  are  displayed  in  Figs.  4,5.  Overall  these 
results  are  of  about  the  same  accuracy  as  those  with  more  sampling 
points . 

Finally,  we  ran  calculations  for  10%  noise  with  31  sampling 
points.  Two  reconstructions  are  shown  in  Figs.  6  and  7.  The 
results  in  Fig.  6  are  extremely  good,  even  those  in  Fig.  7  are 
respectable . 

It  is  of  some  interest  to  list  the  final  iterate  singular 
values  corresponding  to  Figs.  1-3,  see  Table  1.  As  discussed  in 
Appendix  B,  the  Hessian  is  singular  which  is  reflected  in  the 
fact  that  o  =  0.  The  first  few  ordered  singular  values  are 
roughly  equal  for  the  three  cases  in  question;  however,  the  higher 
order  a  are  very  small  and  highly  irregular  in  their  behavior  as 
can  be  seen  by  comparison  of  the  first  and  third  columns.  The 


0. 


other  cases  behave  in  much  the  same  manner,  always  with  a  = 
Calculations  were  also  performed  on  two  special  forms  of 

W(p)  : 

a.  even  parity,  W(p)  =  W(-p),  (e.g.,  spherical  aberration) 

b.  odd  parity,  W(p)  =  W(-p),  (e.g.,  coma). 

Observed  behavior  led  to  the  establishment  of  the  following 
results  (the  proofs  are  omitted  for  brevity). 

1.  If  is  odd,  then  all  subsequent  are  odd  for 

l  >  k . 

/\  (  If  }  A 

2.  If  Wv  is  even  and  t  is  even,  then  all  subsequent 
W  ^  ^  are  odd  for  Si  >  k. 

3.  If  W(o)  minimize  E(W)  and  is  even,  then  -W^°^ 

A 

also  minimizes  E(W). 

/s(k)  *  ^  (k) 

An  odd  or  even  W  does  not  imply  that  the  Hessian  H[W  ] 

is  degenerate,  rather  that  the  spaces  of  odd  vectors  and  even 

^  *  ( k ) 

vectors  are  eigenspaces  of  H.  If,  for  example,  at  any  stage  W 

is  odd  then  condition  2  implies  that  Newton's  method  is  hence¬ 
forth  restricted  to  a  (N-l)/2  dimensional  subspace  of  possible 

A 

solutions  which  may  not  contain  the  true  minimum  of  E(W),  even 

/\  A 

though' the  range  of  H(W)  may  be  larger  than  this  subspace.  Con¬ 
dition  3  raises  the  point  that  although  singular  value  decomposition 
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B 


and  filtering  ensure  that  AW  is  uniquely  defined  for  each  sub¬ 
problem,  the  problem  as  a  whole  can  have  several  solutions,  each 
of  which  is  a  potential  point  of  convergence  for  Newton's  method. 

Rapid  convergence  from  any  initial  state  was  observed  for 
t's  calculated  from  even  or  odd  W(p);  especially  so  if  the  initial 
guess  has  the  same  parity.  In  some  of  these  cases  the  rank  of 
H  reduced  to  (N-l)/2. 
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APPENDIX  A 


The  matrices  g,  G  and  H  defined  in  Section  3  are  given  in 
terms  of  derivatives  of  E  via  Eqs.  3-9  and  3-H-  The  derivatives 
of  E,  in  turn,  require  a  knowledge  of  <J>(W)and  its  first  two  deriva¬ 
tives  . 


The  explicit  expression  for  <J>(W),  as  defined  in  Eq.  3-2,  for 


our  model  is 


p 

^  i  N  iv  p  iw 

<0  (W)  =  ±  l  a  e  m  ne  n  -  x 

m  2  „  £ ,  n  m 


'  f{[  l  %  cos(vmpn  +  “n>]2 
n=l 


+  tl  Vln<vmpn  +  V]  } 

n=l 


-  x  ( A .  1 ) 
m 


The  first  derivative  becomes 


—  =  4  cos( v  p,  +  w,  )  y  a  sin(v  p  +  w  ! 
^  2  k  m  k  k  n  nr  n  n 


2  “kSln(vmPk  +  V  l,  “n  cos<vmpn  +  »n> 

n=i 


(A. 2) 


where  k=l,  N.  The  second  partial  derviatives  of  4>m  are 
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- —  =  i  akalcos[vm(pk-pl}  +  (wk~wi)]  >  k/l 

3w,  3w7 

k  1  (A.  3) 


-  4  l'  akancos^vm<pk-Pn  >  +  ("k-Kn)]'  k =l 


where  the  prime  on  the  summation  sign  implies  that  the  term  n=k 
is  to  be  omitted. 


1 
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APPENDIX  B 


A  major  factor  in  the  difficulty  of  numerically  finding  a 
minimizer  Wv  '(say)  of  E(W)  is  that  many  such  minimizers  exist. 
Consequently  each  is  a  potential  point  of  attraction  for  the 
algorithm.  In  this  appendix  the  existence  of  a  one-dimensional 
subspace  of  minimizers  is  demonstrated;  furthermore  this  ensures 

/V  A 

that  the  rank  of  the  Hessian  H(W)  is  always  less  than  or  equal  to 
(N-l) . 


It  is  convenient  to  define  the  quantities 


c  =  a  cos(v  p  +  w  ) 
m,n  n  rrfn  n 


s  =  a  sin(v  p  +  w  ) 
m,n  n  m  n  n 


c  =y  c 

m  m,n 

n-l 


(B.l) 


S  =S  s 
m  Z-i  m,n 

n-l 


From  these  definitions  and  the  relevant  expressions  in  Appendix 
A,  it  follows  that 

<f>  (W)  =  tt  (C2  +  S2)  -  T 
m  4  m  m  m 


( B .  2 ) 
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(B.  3) 


3<t>  (W) 
m 

aw 


r  (S  c  -  C  s  ) 
2  m  m,n  m  m,n 


The  Hessian  H  can  be  written  as 
M 

H  =  2Y  /  +  0  B(m)| 

np  Z,  I  np  m  np  | 

m=l 


(B.4) 


where 


.  .  30  34- 

.(m)  _  _ m 

np  “  3W  3W 
n  p 

=  ^(S 


2C 

-  S  C  s  ) 

c 

m  m,n 

m  m  m,n 

m,p 

2s 

-  S  C  c  ) 

s 

m  m,n 

m  m  m,n 

m,p 

(B.  5) 


and 


,  v  a24  (W) 

,(m)  =  m 

np  "  3W  3W 
n  p 

=  c  c  +  s  s  ) 
2  m,p  m,n  m,p  m,n 


,  n^p 


(B.6) 


=  -k(c2  +  s2  -  S  s  -  C  c  )  ,  n=p 

2  m,n  m,n  m  m,n  m  m,n 


Let  e  be  a  vector  whose  entries  are  all  unity.  We  now  show 

^  (  0  \  /N  A  /  o  \  ^ 

that  If  W  minimizes  E(W),  then  so  will  the  vector  Wv  +  ce 
for  any  real  c.  The  proof  will  follow  from  the  result  that 

^  A  A  A  /N  A 

4(W)  =  4(W  +  ce)  y  W 


(B.  7) 
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Furthermore  this  result  causes  the  Hessian  to  have  rank 


at  most  (N-l)  by  proving  that  e  is  in  the  null  space  of  H(W), 

i .  e  .  , 


H(W)e  =0  ,  V  W 

A  A 

This  is  equivalent  to  showing  that  the  determinant  of  H(W) 
vanishes.  The  formal  proof  follows  by  virtue  of  two  lemmas. 

Lemma  l 

0(W)  =  <J>(W  +  ce) 


This  follows  from 

A  A 

<t>  (w  +  ce)  = 
m 


ii 

n=l 


iv  p 
m  n 


iw 


ic 


ic 

e 


0  (W) 
m 


=  0  (W) 
m 


Lemma  2 

H(W)e  =  0 


1-29 


(B.8) 


(B.9) 


(B.10) 


(B.ll) 


For  the  proof,  it  suffices  to  show  that 


A<">;  =  5 


and 


Now 


B(m);  .  5 


N 


u'”>;>n  -  y  A<”> 

n  Z,  nP 
P*1 

N 


=  2  n  ~ 


S  C  S  )c 
m  m  ra,n  m,p 


N 


+y  4(C2s  -  C  S  c  )s 

pi 2  *»  m  m,n  m  m  m,n'  m,j 


1  2 

Tf( STriCTT1  „  ~  S  C  s  )C 
4  m  ra,n  m  m  m,n 


m 


1  2 

+r(C  sm  -  C  S  c  )S 
^  m  m,n  m  m  m,n 


m 


=  0 


(B . 12) 

(B.13) 


(B.14) 


L 


Also 


(£<■>>;> 


n 


I 


P=1 


B 


(m) 

np 


N 


c  c 
m,p  m,n 


P=i 


+ 


m,p  m,n 


) 


2(SmSm,n 
2(CmCm,n 
"  2(SmSm,n 


+  C  c 
m  m,n 


+  S  s 
m  m,n 


+  C  c 
m  m,n 


) 

) 

) 


=  0 


(B. 15) 


*( 0 ) 

In  the  program  a  particular  VT  '  of  this  subspace  is  chosen 

due  to  the  constraint  w(0)  =  0  or  in  discrete  form  w(n_i)/2  =  °* 

0 )  'r M 

Since  given  any  minimum  Vr  ,  a  W  which  satisfies  the  above 
constraint  can  be  constructed  by  choosing  a  particular  constant 


c  and  letting 


"CM  "CM 

VT  '  =  VT  ' 


+  ce 


(B.16) 
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<  :s 


then  the  constraint  w(0)  =  0  can  never  be  a  binding  constraint. 

*( 0 ) 

However,  choice  of  a  particular  Wv  does  not  remove  the 

difficulties  caused  by  the  existence  of  an  infinite  set  of  such 
(°) 


} 


APPENDIX  C 


The  Hessian  matrix  H,  Eq.  3.6,  can  be  expressed  as  the  pro¬ 
duct  of  three  matrices  (singular  value  decomposition  of  H) 


H  =  USV 

A  A  A  A  ^ 

where  U  and  V  are  NxN  orthogonal  matrices  (ie,  UU 

/V  A 

the  same  for  V)  and  S  is  an  NxN  diagonal  matrix. 

cr-, 


(C.l) 


.A  .  /V  A 

=  U  U  =  I, 


A 

s  = 


2 


( C  .  2 ) 


N 

The  a's  are  termed  the  singular  values  and  are  the  eigenvalues  of 

A  .  A  A  A 

H  H  s£  =  a£s£  l  =  1,2,  .  .  .  ,  N  (C .  3) 

This  is  the  mathematical  definition  of  the  singular  values,  but 
they  are  calculated  by  an  entirely  different  procedure  which 
guarantees  their  numerical  stability.  The  a's  can  be  numerically 
ordered 


a]_3.°2-  ’  ’  •— aN— ^ 


(C.i4) 

If  H  is  of  rank  k,  where  k<N,  then  the  last  N-k  of  the  a's  are  zero, 


\ 
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ft. 


The  solution  to  the  minimal  least  squares  problem  posed  in 
Eq.  3.7  can  be  cast  directly  into  a  form  involving  the  singular 
values  and  their  corresponding  singularvectors .  Substitution  of 


Eq.  C.l  into  Eq.  3.13  yields  after  some  matrix  manipulations 


/N  A  A/J\  A  1  A  1  A  A/\a  1  A 

AW  =  2(VSV)G%  =  -2HWG  4. 


(C.5) 


The  matrix  H®  5  VS®U+  is  termed  the  Moore-Penrose  psuedoinverse 


of  H.  Here 


with 


S^  5 


N 


(C.  6) 


a+  =  a  if  0  >0 
n  n  n 


(C.l) 


=  0  if  on=0 


It  is  not  our  intent  to  give  a  full  discussion  of  the 
Moore-Penrose  psuedoinverse  for  details  are  available  in  the 

A 

literature  [23-24].  Suffice  it  to  say  that  it  produces  a  AW  which 
satisfies  the  two  minimum  conditions  with  respect  to  our  linear 
problem: 
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A)  It  achieves  the  unique  minimum  of 
||  HAW  t  2G+i  ||. 

A 

B)  If  there  are  any  other  AW  which  satisfy  Eq .  3-13,  then 
Eq.  C.5  is  characterized  among  them  by  having  the 


smallest  norm;  in  other  words,  Eq.  C.5  minimizes  | j  A W 
among  the  solutions. 


The  solution  given  by  Eq.  C.5  becomes  somewhat  more  trans¬ 
parent  if  the  right  hand  side  of  Eq.  B.5  is  written  out  more 


explicitly 


k 

/\  ^  ^  U  y  G  (P  /v 

AW  =  ~22  ~b  vi> 

1=1  L 


( C .  8) 


where  u^  and  v^  are  the  l- th  column  vectors  of  U  and  V  respectively, 
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APPENDIX  D 


In  this  appendix  we  establish  bounds  on  the  minimum  VT  ' 
We  first  prove  that  E(W)  is  periodic  in  each  coordinate  with 
period  2n ,  i .  e  .  , 


E(W)  =  E(W  +  2ve  ) 


(D.l) 


where  e  is  the  Ith  unit  vector.  For  a  proof  it  suffices  to 

A  A 

show  the  result  for  the  components  of  <p(W) ,  thus 


<f>m(W  +  2Tre£)  =  ^  e 


iv  p  i(w  +2ire£) 
m  n  n  n 

e 


N  iv  p  iw  2 
=  £  e  m  n  e  n 

n=l 


=  0m(W) 


(D.2) 


This  property  is  dependent  on  the  particular  discretization 

chosen.  It  establishes  the  desirable  result  that  there  exists 
*  ( o ) 

a  minimum  Wv  '  of  E(W)  in  the  region 


S  =  {  W  I  -it  <  w  <  tt  } 
'  n  - 


Since  E(W)  has  period  2tt,  we  have 


(D.3) 


min  ^  min 
~  „  E(W)  =  ,  E(W) 


(D.4) 
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A  A 

E(W)  is  continuous  and  S  is  a  closed  set,  consequently  E(W)  attains 

~  ( o )  ''(’O') 

its  minimum  on  S  at  some  point  W  in  S .  By  periodicity,  W 

is  also  a  global  minimum. 

~  ( o ) 

Periodicity  provides  bounds  for  the  minimum  W  which  are 

c  o ) 

helpful  in  searches  for  W  ,  but  at  the  same  time  indicates 

A  A 

E(W)  is  a  complicated  surface  with  many  maxima,  minima,  and 
saddle  points,  obviously  a  surface  on  which  most  minimization 
algorithms  will  have  difficulty! 
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TABLE  1 . : 

FINAL  ITERATE 

TO  FIGS.  1-3 

SINGULAR  VALUES 

RESPECTIVELY 

a£  CORRESPONDING 

l 

1 

.1299778 

.1293188 

.1308360 

2 

.0654607 

.0663010 

.0651481 

3 

.0147847 

.0148309 

.0151498 

4 

.0069802 

.0059728 

.0061527 

5 

.0044707 

. 0040800 

.0040639 

6 

.0029514 

.0025031 

.0030560 

7 

.0022897 

.0014872 

.0021063 

8 

.0019339 

.0014096 

.0016379 

9 

.0014224 

.0010693 

.0014751 

10 

.0010348 

.0008257 

.0013507 

11 

.0009966 

.0007715 

.0012386 

12 

.0005798 

.0005468 

.0011485 

13 

.0003987 

.0005114 

.0010666 

14 

.0003521 

.0004359 

.0010383 

15 

.0002837 

.0003983 

.0009781 

16 

.0002275 

.0003603 

.0009411 

17 

.0001589 

.0003352 

.0008945 

18 

.0000700 

.0003189 

.0008881 

19 

.0000346 

.0002814 

.0005131 

20 

.0000170 

.0002568 

.0004882 

21 

0 

0 

0 
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FIGURE  LEGENDS 

Pig.  1  True  wavefront  (solid  line),  reconstructed  wavefront 
realization  (solid  circles):  E  =  .07841(1,  M  =  31, 

A 

5%  noise  in  t . 

Fig.  2  True  wavefront  (solid  line),  reconstructed  wavefront 
realization  (solid  circles):  E  =  .078154,  M  =  31, 

/v 

5%  noise  in  t. 

Fig.  3  True  wavefront  (solid  line),  reconstructed  wavefront 
realization  (solid  circles):  E  =  .089435,  M  =  31, 

5%  noise  in  t. 

Fig.  4  True  wavefront  (solid  line) >  reconstructed  wavefront 
realization  (solid  circles):  E  =  .075563,  M  =  21, 

A 

5%  noise  in  x. 

Fig.  5  True  wavefront  (solid  line),  reconstructed  wavefront 
realization  (solid  circles):  E  =  .327351,  M  =  21, 

5%  noise  in  t. 

Fig.  6  True  wavefront  (solid  line),  reconstructed  wavefront 
realization  (solid  circles):  E  =  .097244,  M  =  31, 

A 

10%  noise  in  x. 

Fig.  7  True  wavefront  (solid  line),  reconstructed  wavefront 

realization  (solid  circles):  E  =  .106523,  M  =  31, 

/\ 

10%  noise  in  x. 
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w(p) 


p 


True  Wavefront  (Solid  Line),  Reconstructed  Wavefron 
Realization  (Solid  Circles):  E  -  .078441,  M  -  31, 

5 7.  Noise  in  r 


V I  G~  I 
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w(p) 


p 


True  Wavefront  (Solid  Line).  Reconstructed  Wavefront  Realization 
(Solid  Circles):  E  -  .089435,  M  -  31,  57,  Voise  in  t  - 
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ci  rv  a 


•% 
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r  t  r_  s 


w(p) 


p 

True  Wavefront  (Solid  Line),  Reconstructed  Wavefront  Realization 
(Solid  Circles):  E  -  .106523,  M  -  31,  10%  Noise  in  t  . 
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SECTION  2 


OPTIMUM  BALANCED  WAVEFRONT  ABERRATIONS 
FOR  RADIALLY  SYMMETRIC  AMPLITUDE 
DISTRIBUTIONS:  GENERALIZATIONS  OF  ZERNIKE 

POLYNOMIALS 

ABSTRACT 

The  Zernike  aberration  theory  for  constant  amplitude 
circular  apertures  is  extended  to  annular  apertures  having  a 
Gaussian-like  radial  taper.  Explicit  expressions  are  obtained 
for  the  optimum  balanced  wavefront  aberrations  in  terms  of 
shifted  Jacobi  polynomials.  Properties  of  the  polynomials 
(e.g.,  Rodrigues  formula,  recurrence  relations,  derivatives, 
etc.)  are  investigated 


1. 


INTRODUCTION 


The  complex  diffracted  amplitude  in  the  receiving  plane, 
given  that  the  exit  pupil  is  circular,  is 


a(v,<}>)  = 


1 


2  TT 


Vo 


Afl  (  p,  9  )exp{ikW(  p  ,0  jxQ  ,yo)+ipvcos(6-<)>)}d0pdp  (1.1) 


where  VI(p}9|x  ,y  )  is  Hamilton's  mixed  characteristic  (wavefront 
aberration)  function  with  respect  to  the  object  plane  coordinates 
x0,  yQ  and  A  (p,8)  is  the  amplitude  distribution  over  the  exit 
pupil.  The  point  spread  function  t(v,4>)  is  given  by 


t(v,<t>)  = 


a  ( v ,  <j> )  1 2 
a  (  0 , 0 ) 


(1.2) 


so  that  0  £  t(v,<f>)  <  1, 


For  many  optical  systems,  A„(p,9)  is  constant  over  the 
aperture.  Without  loss  of  generality  we  set  Ao(p,0)  =  1;  such 
systems  are  termed  Airy  systems.  The  Zernike  polynomials  play  a 
fundamental  role  in  the  diffraction  theory  of  aberrations  of  Airy 
systems  [x-17l>  It  Is  also  possible  to  obtain  the  same  results 
by  direct  application  of  Marechal  aberration  balancing  theory 
[6,181  although  not  without  considerable  effort. 

An  extension  of  the  Zernike  type  theory  to  nonconstant  aper¬ 
ture  distributions  is  not  without  interest  especially  those  that 
are  radially  dependent,  i.e.,  A  (p,9)  =  A0(p).  We  consider  the 
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case  where  A  (p)  is  given  by 


A  (p)  =  (l-p2)a 


£  <  p  <  1 


0  <  P  <  £  , 


a  >  0  . 


(1.3) 


In  other  words  we  are  considering  an  annular  aperture  of  inner 
radius  e  with  a  Gaussian-like  amplitude  taper.  Note  that  for 
large  a,  we  have  (l-p2)a  c:  exp(-ctp2)  for  small  p.  The  situation 
stated  in  Eq.  1-3  is  precisely  the  one  encountered  in  active 
optics  using  a  segmented  annular  mirror  that  is  illuminated  by  a 
laser  beam.  Two  special  but  important  cases  are: 


A.  annular  aperture  A0(p)  =  1,  e<p<l 

=0,  0  ^  p  <  £ 

B.  Gaussian  aperture  A  (p)  =  (l-p2)a 


(1.4) 


(1.5) 


Generally  speaking,  small  to  moderate  amounts  of  wavefront 
aberration  take  energy  out  of  the  central  core  of  the  diffraction 
pattern  and  add  it  to  the  diffraction  rings.  Furthermore,  there 
is  very  little  change  in  the  gross  characteristic  width  of  the 
central  core  of  the  diffraction  pattern.  Apodization,  however. 


competes  with  aberration  effects  in  case  B  in  that  it  takes 
energy  out  of  the  diffraction  rings  and  adds  them  to  the  central 
core,  while  simultaneously  broadening  the  characteristic  width 
of  the  central  core.  Case  A  apodization  behaves  in  a  cooperative 
way  with  the  aberration  effects  by  adding  even  more  energy  tc  the 
diffraction  rings  while  decreasing  the  characteristic  width  of 
the  central  core.  The  general  case  is  intermediate.  These 
apodization  effects  must  manifest  themselves  in  the  determination 
of  the  optimum  balanced  wavefront  aberrations.  Obviously  the 
nonconstant  A  (p)  cases  lead  to  functions  that  differ  from  the 
Zernlke  polynomials  of  the  usual  Airy  system. 

The  purpose  of  the  present  paper  is  to  obtain  general 
explicit  expressions  for  the  optimum  balanced  wavefront  aberra¬ 
tions.  The  aberration  functions  corresponding  to  Eq.  1.3  are 
denoted  by  C™(p,e,a).  The  functions  corresponding  to  case  A  are 
denoted  by  A™(p,e),  those  of  case  B  by  B™(p,a).  When  e  =  0  and 
a  =  0,  these  functions  reduce  to  the  usual  radial  Zernike  poly¬ 
nomials  R™(p).  The  method  employed  in  this  paper  is  a  general¬ 
ization  of  the  elegant  (and  efficient)  procedure  developed  by 
Bhatia  and  Wolf  [4]  in  their  classic  paper  on  Zernike  poly¬ 
nomials.  Our  basic  concern  is  with  the  development  of  explicit 
expressions,  orthogonality  conditions,  recurrence  relations,  etc., 
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and  not  with  the  diffraction  imagery  associated  with  the  poly¬ 
nomials.  However,  Sec.  5  is  devoted  to  formulae  for  the  Strehl 

criterion.  Evaluation  of  the  Hankel  transforms  of  Cm,  Am,  and 

n*  n* 

8™  so  necessary  for  the  analytical  aspects  of  diffraction 
imagery  is  under  investigation. 


?-4 


■  5 


2. 


DERIVATION  OF  o'"  POLYNOMIALS 
n 

Four  conditions  [5]  are  imposed  on  the  radial  Zernike  poly¬ 
nomials  Rm ( p ) : 
n 

1.  R's  are  orthogonal  over  (0,1)  with  weight  factor 
unity,  i . e .  , 

■  * 

R„(P)  r™,(p)  pdp  =  ( 2n+2 ) - 1  6^,  .  (2.1) 

0 

2.  R™(p)  ls  a  Polynoinial  °f  degree  n  in  p  and  its 
lowest  term  is  of  degree  m  in  p. 

3.  R™(p)  ls  t0  be  even  or  odd,  the  parity  being  the 
same  as  that  of  n,  this  means  that  n-m  is  always 
an  even  integer. 

** .  R™(p)  ds  normalized,  R™(1)  =  1  for  all  n  and  m. 

Condition  M  is  a  corollary  to  Condition  1. 

We  require  that  C™(p)  satisfy  a  modified  version  of  these 
conditions.  Condition  1  now  becomes 

1 

C™(p)(l-p2)a  Cjj,(p)  pdp  =  hJJ(e,a)  5nn,  ,  (2.2) 

e 

where  the  constant  h™(e,a)  will  be  evaluated  shortly.  Condi¬ 
tions  2  and  3  are  unchanged,  while  Condition  4  is  modified 

:  slightly  to  read  C™(1)  h  l  for  all  n  and  m  and  for  0<e<l,  a>_0. 

[ 

I 

f 

t 
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Condition  2  is  the  crucial  one  in  that  the  Zernike  poly¬ 
nomials  R™,  although  orthogonal,  do  not  form  a  complete  set 
[21,22].  The  procedure  employed  by  Bhatia  and  Wolf  [4]  (and 
the  standard  procedure  in  such  cases  as  the  associated  Legendre 
polynomials),  is  to  factor  out  pm  so  that  the  remaining  polynomial 
is  of  degree  (n-m).  This  polynon.ial  is  orthogonal  with  respect 
to  the  nonnegative  weight  factor  pm.  Consequently,  by  standard 
theorems  in  the  theory  of  classical  orthogonal  polynomials  [22, 
23],  this  polynomial  set  is  complete.  Our  procedure  is  a  general¬ 
ization  of  this. 

We  factor  C™(p)  into  two  polynomials 

C™(p,e,a)  =  N™(e,a)  cm(p,e)  pn_m(p,e,a)  ,  (2.3) 

where  the  subscript  denotes  the  degree  of  the  polynomial.  N™ 
is  a  normalization  constant.  The  p-polynomials  will  now  form 
a  complete  orthogonal  set  with  respect  to  the  weight  factor 
[cm(p,e)]2  over  the  interval  (e,l),  provided  that  c  (p,e)  >  0 
over  the  same  interval.  In  fact,  we  will  set 


for  reasons  to  be  apparent  shortly.  To  determine  the  p-polynomial , 
we  employ  the  known  fact  [21,22]  that  if  the  weight  factor 
cm(p,e)  is  of  the  form  given  in  Eq.  2.4,  then  p^  m  must  be  a 
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scaled  version  of  the  Jacobi  polynomials  P^a’^(g)  as  defined 
in  [20,21].  In  point  of  fact 


pn-m(p’e,a)  =  P(nlm)/2<S)  ■ 


(2.5) 


where 


g  =  2 


(S?)- 


(2.6) 


It  remains  to  determine  t.he  normalization  constant  N  (e,a) 
The  Jacobi  polynomials  satisfy  the  condition 


■  (?) 


independent  of  0.  Since  cm(l,e,a)  =  1,  it  follows  that 


(2.7) 


[N^e.a)]"1 


n-m+a 

2 


(2.8) 


Note  that  if  a  =  0,  then  Nn  =  1  Independent  of  e. 

Putting  all  these  components  together,  we  have 


:”(p,c,a)  .  N”(e,o) 


as  the  sought-for  expression. 


,(a,m) 

(n-m)/2 


{£?)- 


(2.9) 


! 


1 


When  e  =  a  =  0,  we  reduce  to  the  usual  Zernlke  radial  poly¬ 
nomial 

C™(p,0,0)  =  rJJ(p)  =  pmP ( n— m ) /2  t2p2-l]  (2.10) 

as  first  noted  by  Bhatia  and  Wolf  [4].  Bear  in  mind  that  they 
used  the  old  G  notation  for  the  shifted  Jacobi  polynomials. 

We  wish  to  point  out  that  Tatian  [19 ]  had  previously  con¬ 
sidered  the  problem  of  optimum  balanced  aberrations  for  the 
annular  aperture,  however,  he  does  not  derive  any  explicit 
expressions.  Arimoto  [20]  considered  the  case  somewhat  analogous 
to  our  case  B. 
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3.  PROPERTIES  OF  Cp  POLYNOMIALS 


Since  C™  is  proportional  to  a  shifted  Jacobi  polynomial, 


we  can  use  its  properties  as  listed  in  [27,22]  to  derive  properties 


,m 


of  C  .  For  typographic  convenience,  we  omit  the  explicit 


dependence  on  C™  of  e  and  a  and  write  C™(p)  =  C™(p,e,a) 


,m 


A  very  useful  finite  series  representation  of  can  be 


obtained. from  Eq.  4.3-2  of  [27],  it  is 


/~»m  /  \ 

cn(P>  ■  Nn 


(tS' 


2  ^2\m/2  (n-m)/2 

l 

v=0 


/n-m+2a  \ 

,(n+m\, 

l  2  l 

A  2  /• 

v!  (v+a) 
2  -i  \ v  /  2  .  2\ (n-m-2v)/2 


n+m~2v 


-mm) 


(3-D 


Explicit  expressions  for  the  lower  order  C™(p)  are  given  in 


Table  1. 


A  Fodrigues  formula  for  C™ '  follows  from  Eq .  4.3-1  of  Szego 


by  appropriate  change  of  variable.  The  final  result  is 

m(  )(n-m)/2 


C>  =r- 


^(n-m) 


.  (i-es)  (n 


-",>/2(oJ-EOm/2(i-p!)‘I 


(i) 


(n-m)/2 


(  p 2_£  2 ) (n+m)/2  ^1_p2 ^n-m+2ot)/2 


(3-2) 


1 


2-9 


A  recurrence  relation  for  fixed  rr.  and  variable-  n  cati  also 
be  derived  from  the  basic  expression,  Eq  .  t .  h . 1 ,  in  Szero. 
Straight  forward  manipulations  yield 


^(n-m)  (r.+m+2a)  (  n+a-2)  C"‘(  p) 


( n+a )( n+a-1 )( n+a-2 )  r  +  (n+a-1 ) ( a2-m2 ) 


^  (n+a)  ( n-m+2a-2 )  (n+m-2  )  Cr‘  ^(p) 


(3.3) 


for  n-m>2  and  *  is  riven  i.  y 


C;  (p) 


The  initial  polynomials  are 


m+ : 


v  P . 


=  (1  +  ci 


,  -  i 


(l-£ 


2  \ 


-(m  +  2 


(p?-t2)' 


(m+2+a ) p 2  - ( l  +  a ) £ : 


(3.M 


When  n  =  n  -  0,  Eq .  3-3  reduces  to  the  recurrence  relation  for 
Zernike  polynomials  given  in  Myrick  [  0  ]  and  in  Kintner  [12]. 

Noll  [it]  has  pointed  out  the  usefulness  of  the  derivative 
of  the  Zernike  polynomials  for  certain  applications.  Both  he 
and  Kintner  [ 1 ■ .  ]  have  developed  such  recurrence  relations.  We 
can  obtain  one  such  relation  for  C™  in  the  following  fashion . 
Differentiate  both  sides  of  Eq .  2.8  with  r.'Spe.M  to  p,  the 
result  is 


r1-!  n 


T-  Cm(p) 

dp 


=  mp( p2-e2 ) 


+  N 


m 


P  2-£  2 

il-e2  ) 


c”(p) 


m/2 


4p 


(  1  -  £  2  ) 


iL  p(a,m)  ,  v 
dg  r(n-m)/2 


(3.5) 


The  derivative  (with  respect  to  g)  on  the  right-hand  side  can 
be  expressed  in  terms  of  the  function  itself  by  using  Eq .  4.5-7 
of  [?;].  Upon  combining  all  terms 


(n+a ) p  1 (p2-e2 ) (1-p2 )  j^C™(p) 

=  jm(n+a)  (1-p2 )  -  |  (n-m)[(n+a)  g  +  m  -a]j  C™(p) 

,  /n-2  +  a\/h  +  a\~ 1 

+  j  (n+m)  (n-m+2a)  I  jy  J  Cn_2(p)  '  (3-6) 

This  formula  expresses  the  derivative  directly  in  terms  of  two 
polynomials  of  adjacent  degree  n  and  fixed  order  m.  When  e=a=0, 
this  reduces  to  an  expression  given  in  [153- 

We  now  proceed  to  the  evaluation  of  the  coefficient  h™  (e,a 
associated  with  the  orthogonality  requirement,  Eq .  2.2.  Upon 
substituting  Eq .  2.9  into  Eq.  2.2  and  transforming  to  the  variab 
g  defined  in  Eq .  2.6,  we  obtain 
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,  m ,  , 

n  (  z  ,  a , 

n  5 


i:.:  )■(-.•)  " 


2 


m-t-m-t-d 


+  1 

( v’+S)m(  l-g)a 


-1 


,  (  a ,  m )  ,  „  , 
(n-m)/2  r  ^ 


dg  (3.7) 


The  integral  has  been  evaluated  in  Szego,  Eq .  4.3*3.  The  final 
result  is 


n: 


m 


'~T(n+a+Ty 


,  2\  a+1  n+m+? 

1-e  j  r  - 


n-:r:+2  +  2a 


„  n+m+2+a  r 

n+m+2+2a 

i  2 

2 

(3.3) 


When  e  =  a  =  0,  this  reduces  to  the  usual  result 


nj|(  0  5  0)  =  (  2n+l ) -1 


(3.9) 


STREHL  CRITERION 


Since  the  aberration  polynomials  C™  are  already  optimum 
balanced,  the  maximum  intensity  (Strehl  criterion)  of  the  point 
spread  function  is  at  v  =  0  irrespective  of  <p .  Thus,  we  have 


t(0,<J>)  = 


a  +  1 


2u ( 1-e 2 ) 


1  r  2tt 


a 


(l-p2)aexp{ikW(p,0) }depdp 


(4.1) 


Provided  that  W(p,9)  is  small,  we  can  expand  the  exponential  and 
retain  only  the  first  three  terms 


exp {ikW}  :  1  +  ik  W  -  ^  k2W2  + 


The  term  linear  in  W  will  vanish  upon  integration  leaving 

1 1  r2* 


(4.2) 


t(  0, 4>) 


1  - 


k2 ( q+1 ) 

4tt(  1-e  2  )a+^  J&  J 


~  1  - 


k2(q+l) 

a+1 


r2ir 


(l-P2)aW2(p,0)d0pdp 


( 1-p2 )aW2 ( p, 0)d0pdp 


2 it ( 1-e2 )  Je  '0 

The  expansion  of  W  at  a  fixed  object  point  xQ,y0  is 


(*.3) 


OO  00 


W(p,9|x,y)  =  l  l  [c  (x  ,y  )cosm9 
n=0  m=0 


+  smn(xo’yo)sinm6]Cn(p) 


(4.4) 


where  e  (x  ,v  )  and  s  (x  ,y  )  depend  on  the  object  point.  The 
mn  o ’ 1  o  mn  o  ’ J  o 

restrictions  on  n  and  m  are  that  n  >  m  and  (n-m)  is  an  even  inte¬ 
ger.  In  the  special  case  of  rotational  symmetry,  s  =  0. 


The  series  in  Eq .  4.4  is  now  substituted  into  the  integral. 
The  orthogonality  relations  for  the  trignometric  functions  and 
for  the  C™  polynomials  allow  us  to  obtain 


t(  0,<t>) 


1  - 


k2  (ot+1 ) 

/  ,  2\“+- 
TT ( 1-e 2 ) 


,  2 
'nr 


s 2  )  hni  (  e  ,  a ) 
nm  n 


(4.5) 


as  the  final  expression  for  the  Strehl  criterion.  The  prime  on 
the  summation  sign  indicates  that  the  terms  for  which  m  =  0  are 
provided  with  a  factor  of  one-half.  When  e  =  a  =  0,  this  reduces 
to 


t(0,<j>) 


s 2  ) 

nm 


( 2n+2 ) “ 1 


(4.6) 


i 
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c°o  =  1  ! 

c{  =  (l-e2)~h(p2-e2)k 

1  j 

C  °  =  (l+a)-1(l-e2)-1[(2+a)p2-(l+a)e2-l] 

C2  =  ( 1-e 2 )“ 1 ( p2-e2 ) 

C  3  =  (1+a)-1  (l-e2)'3/2(p2-e2)1'2[(3+a)p2-(l+a)e2-2] 

C 2  =  (1-e2  )_3/2  Cp2-e2 ) 3/2  , 

C°  =  (l+a)-1(2+a)-1(l-e2)-2{( 3+a )  (4+a ) p4-2 ( 3+a )[2+(2+a)e2]p2 

+  [  ( 1+a)  (  2+a )  e "*+4  (  2+a)  e 2+2  ] } 

C2  =  (l+a)“1(l-e2)-2(p2-£2)[(4+a)p2-(l+a)e2-3] 

Cl  =  (1-e  2)"2(p2-e2)2 

C |  =  (1+a )-1 ( 2+a)“ 1 ( 1-e2 )~ ^ (pz-£2 ) (4+a) ( 5+a) p4 
-  2 (4+a ) [ 3+( 2+a ) e 2 ]p2 
+  [ 6+6 (2+a ) e 2  +  (1+a ) ( 2+a ) e 2  ]  } 

C2  =  (1+a)-1  (1-e2  )  72  (p2-e2 )  ^2  [  ( 5+a )  p2 -4- (1+a)  e 2  ] 

C*  =  ( 1  -e 2 ) ~ 5/2  (  p 2 -e 2 ) 5//z 

{ 

* 

i 
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TABLE  1.  (Cont.) 


C°  =  (l+a)-1(2+a)  1  C  3+ot )  1  ( 1-e  2 )  ~ 3  { ( 4+a  )  ( 5+a  )  ( 6+a )  p 6 
6 

-  3(9+a)(5+a)[3+(3+a)e2]p4 

+  3(^+a)[6+6(3+a)e2+(2+a)(3+a)e4]p2 
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C4  =  (1+a)- 1 (1-e 2 )- 3 ( p2-e 2 ) 2 { (6+a) p6 -[ 5  +  ( 1+a )e 2  ]  } 

C ®  =  ( 1-e 2 ) ” 3 ( p2-e  2 ) 3 
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SECTION  3 


APPLICATION  Of  FILTERED  SINGULAR  VALUE  DECOMPOSITION 
TO  WAVEFRONT  DECONVOLUTION 


The  purpose  of  this  note  is  to  outline  a  solution  of  the 
wavefront  deconvolution  problem  using  the  method  of  filtered 
singular  value  decomposition  taking  direct  account  of  the  fact 
that  noisy  measurements  are  involved. 


The  basic  equation  is 


4>.  (r)  =  £  B.  (r) z  (r) 

1  ,  in  n 

n=l 


where 


q±(r) 


=  ith  measured  wavefront 

=  phase  aberration  function  for  nth  optical 
element 


Bln,r’ 


influence  function  connecting  ith  wavefront 
to  nth  optical  element 


We  are  given  4^  and  all  the  other  data  (via  noisy  measurements) 
and  are  required  to  determine  z^. 

It  is  convenient  to  rewrite  Eq .  1  in  matrix  form,  so  that  in 
an  obvious  notation  we  have 


4)  =  ?■  z 


•«*  ******  • 


where 


R 

is 

size  M 

x  N 

(M 

rows , 

II 

columns ) 

z 

is 

size  N 

x  1 

(N 

rows , 

1 

column ) 

4> 

is 

size  M 

X  1 

(M 

rows , 

1 

column)  . 

For  our 

s it  uation 

M  > 

N, 

with 

the  most  likely  case  beinr  M  =  N 

(i.e.,  number  of  observations  =  number  of  unknowns). 

The  fundamental  difficulty  with  ill-posed  problems  is  the 

lack  of  sufficient  information  from  response  measurements  to 

infer  the  correct  solution.  This  is  reflected  (mathematically)  in 

the  fact  that  the  system  matrix  P  tends  to  be  underdetermined  (rank 

deficient)  even  if  it  is  formally  overdetermined  (more  rows  than 

columns).  Our  approach  then,  is  to  aurment  the  data  provided  by 

the  instrument  with  any  additional  knowledge  of  the  nature  of  the 

quantity  being  measured  in  order  to  make  the  computed  solution  at 

least  physically  meaningful  and  possibly  even  correct.  Mathe- 

/\ 

matically  this  amounts  to  building  up  the  rank  of  the  matrix  B, 
or  reducing  the  solution  space  so  as  to  yield  a  unique  solution 
which  satisifes  all  constraints  known  to  hold  a  priori.  It  is 
tempting  to  utilize  naive  least  squares  to  ’’solve"  this  problem, 
i.e., 

B+Bz  =  B+$  (3) 

where  B+  is  the  transpose  of  B.  Now  (B+B)  is  symmetric  and  we 


can  formally  invert  to  obtain  z  in  the  form 


z  =  (B  B) 


- 1 


B  4> 


(H) 


If: 

A  A 

1.  System  matrix  B  and  data  matrix  are  exact  (i.e., 

A  A 

no  uncertainty  in  B  and  4> )  , 

2.  B+3  is  of  full  rank, 

3.  Precision  of  the  arithmetic  of  the  computer  is  such 
that  B+B  can  be  formed  and  stored  exactly 

then  the  solution  z  can  be  obtained  from  ( B+B) ~ 1  B+<£ .  Unfortunate! 
these  three  conditions  are  not  to  be  encountered  in  the  deconvolu¬ 
tion  problem,  in  that  conditions  1  and  2  are  not  satisfied. 


The  difficulty  is  that  we  do  not  knovi  the  rank  of  B  and 
until  we  determine  it,  we  cannot  invert  the  matrix  equation.  The 
only  way  that  we  can  determine  the  rank  of  B  is  to  use  the  method 
of  singular  value  decomposition  and  thus  determine  the  rank  of  B 
during  computation.  This  is  not  to  say  that  there  are  not  other 
methods  to  accomplish  this  but  they  are  generally  not  to  be 
trusted  (e.g.,  ride-regression). 


We  must  realize  that  the  deconvolution  problem  is  extremely 

A  /\ 

complicated  because  both  B  and  <J>  are  measured.  The  vasit  major¬ 
ity  of  inversion  problems  encountered  in  the  applied  sciences 

A 

have  the  simplifying  feature  that  the  system  matrix  P  is  known 
exactly. 
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To  solve  our  problem  by  singular  value  decomposition  we  note 


that  the  rectangular  matrix  B  can  be  written  as  the  product  of 
three  matrices 

/\  A  A  A  X 

B  =  UAV  (5) 


where 


U  =  M  x  M  orthogonal  matrix  (00+  =  ’J+,j  =  I  ) 

V  =  N  *  N  orthogonal  matrix  (VV+  =  V+V  =  1^) 

A  =  M  x  m  matrix  with  nonnemative  elements  on  the  main 
diagonal  and  zeros  elsewhere. 


A  has  the  form 


A 


0 

0 

0 


0 

a 

0 

0 


0 

0 


0 


0 

0 


A 

D 

A 

0 


(6) 


The  o's  are  termed  the  singular  values  of  B  and  are  the  solutions 
of  the  eigenvalue  problems 


A  A  +  A 
BB  u  . 
,1 


o  . 

J 


u . 
,1 


5 


J  =  1 


?  + 

o 


O  .  V  . 

.1  .1 


(7) 


where  u .  and  v.  are  the  ,i  th  column  vectors  of  U  and  V.  This  is 
the  mathematical  definition  of  the  o's  but  they  are  never 


evaluated  from  the  definition  unless  B  is  mathematically  exact 
(certainly  not  the  case  with  which  we  have  to  contend'.). 


The  o's  can  be  ordered  so  that 


°i  *  °2  *  °3  *  *  *  *  *  °N  *  0  • 


If  the  rank  of  B  is  k  where  k  <  N,  then 


CTk+l  °k+2 


=  a*  ,=  0 


(8) 


(9) 


The  solution  to  the  minimal  least  squares  problem  posed  by 
Eq.  2  can  be  found  in  the  following  manner.  Multiply  both  sides 

A  4- 

of  Eq.  2  by  p,  and  formally  invert  to  met 

z  =  ( B+B) " 1  B%  (10) 


Substitute  Eq.  5  into  the  right  hand  side 


A  4  A  .  i  A  X  A  A  /\  4  A  4-  A  A  A  4  ,  A  A  A  4"  A 

(B  B)_1B  <t>  =  (VA  U  UAV  )“ 1 VAU  $ 

-A  -A  /V  -  /\  /\  AS  ,  A  AAl  As  4.  ✓V  4_  /A 

=  ( VA~  1  U~  1  UAv“ 1  V“ 1 VAU  )V  AU  <J> 

A»  As  ,  A  A  A  A  4  A 

=  VA~ 1 U  <J>  5  B  <J> 

a4  _  AA_,A4  ^ 

The  matrix  B  =  VA  U  is  termed  the  v&updoinverse  of  B. 


(11) 


The 


matrix  A  1  is  the  !J  *  M  matrix 


2  * 


0 

0 


(1?) 


where 


at  9  —  if  o .  >  0 

J  o .  J 


=  0 


if  a.  =  0 

J 


(13) 


The  solution  becomes  clearer  if  the  riRht  hand  side  of  Eq .  9 
is  written  out  explicitly 


k 

S  -  l 


j=l 


+2 

u,<J> 

_ J  _ 


J  J 


\i  ’ 


k  s  N 


(1*0 


A.  /s 

where  u.  and  v.  denote  column  vectors  of  the  matrices  U  and  V. 
j 

The  smaller  singular  value  entering;  into  the  denominator  of 
the  terms  of  the  expansion  tend  to  mreatly  magnify  any  error  in 
the  data  vector  $ ,  resultin'*  in  a  spurious  solution.  To  alleviate 
this,  the  expansion  must  be  cut  off  (in  some  rational  fashion) 
before  the  contamination  due  to  the  small  singular  values  enters. 


One  way  to  achieve  this  is  to  set 

+  1 

a .  -  —  ,  a .  >  c 

.)  o ,  .1 


o .  <  e 

,1 


0 


(15) 


where  the  criterion  for  picking  e  is 

—  >>  noise  .  (16) 

0  o 

This  approach  has  been  employed  by  the  author  for  several  inver¬ 
sion  problems.  However,  it  is  not  recommended  for  the  deconvolu¬ 
tion  problem  because  it  is  virtually  impossible  to  now  how  much 
noise  there  is  in  the  system. 

Instead  we  go  back  to  Eq.  12  and  introduce  a  filtered  solu¬ 
tion 


z  =  l  [f(a  )utq]v 

J 


(17) 


where  f(cm)  is  a  filter  function  depending  on  the  singular  values. 
The  filter  function  is  required  to  act  like  l/ai  for  large  ck  , 
approach  zero  for  very  small  cm  ,  and  finally  to  decease  from 
1/a ^  to  zero  smoothly  in  the  intermediate  range,  A  useful  candi¬ 
date  is 


f  ( a  1 ) 


N+l 

ai 


N 


+  k 


N+l 


(18) 


where  N  =  0,1,2,** •  and  k  is  a  positive  constant.  Figures  7  and 
8  show  the  behavior  of  f(ai)  for  N  =  1,3  as  a  function  of  various 
values  of  k.  Previous  calculations  made  by  the  author  on  other 
!  u*.  simpler!  )  inversion  problems  have  indicated  that  N  =  1  is  a 
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desirable  compromise  between  too  much  smoothing  and  too  great  a 
sensitivity  on  k.  Knowing  the  singular  values  o ^  and  requiring 
that  k  be  less  than  the  maximum  singular  value  is  one  way  of 
determining  "optimum  k." 

The  question  of  uniqueness  is  a  serious  problem  when  the 
data  is  noisy.  One  of  the  useful  features  of  the  present  approach 
is  that  it  is  possible  to  get  a  quantative  measure  of  the  unioue- 
ness  of  the  solution  in  the  presence  of  noisy  data. 

Let  us  put  the  subscript  s  on  the  solution  given  in  Eq .  11 
to  denote  that  the  solution  is  in  terms  of  noisy  data;  hence 

z  =  ( VA- 1 U+ )  4>  =  1U  •  (19) 

S  o  s 

To  obtain  "nice"  z  ,  we  have  had  to  discard  singular  values  in 
H  .  The  cost  we  have  had  to  pay  is  a  degradation  in  "resolution" 

S 

of  the  sought  for  parameters.  We  quantify  these  arguments  in 

A 

the  following  manner.  Multiply  Eq .  19  by  Hg 

H  <j>  =  H  Bz  .  (20) 

s  s 

A 

The  left  hand  side  is  z  .  We  can  also  manipulate  the  right  hand 
side 
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Now  let 


r/ 

“s 


H  Bz 
s 


A  A  A  I  A  A  A  X  A 

=  (VA_1U  )(UAV  )z 


=  VA~ 1 AV ' z 


A  A  A.  A 

=  W  z 


a 

R 


A  A  X 

w 


(21) 


(22) 


so  that 

A  A  A 

zs  =  Rz  .  (23) 

A.  A 

Thus,  the  degree  of  which  R  approximates  the  unit  matrix  I  is  a 
measure  of  the  uniqueness  of  the  solution. 


There  is  also  an  interpretation  of  the  U  matrix,  although  it 
is  not  as  important  as  the  V  matrix  interpretation.  Consider  the 
basic  equation  again 

A  AA  AAA+A 

4>  =  Bz  =  UAV  z  (2M 


and  set 


A  A  -X  A 

Z  =  V  z 


(25) 


then 

q  =  uaz 

If  we  let 


(26) 
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(27) 


then 


A  A  J*  A 

4>  =  U  4> 


$  =  AZ 


(28) 


Now  when  noise  is  present,  let  us  multiply  both  sides  by  U 

(29) 


A  A  A  A  -i.  / 

U$  =  UU  c 


Jet 


A  A  A  1 

s  =  uu 


Since  our  data  is  noisy  S  is  not  the  unit  matrix  so  that 

/V  A  A  A 

U$  =  S<J> 

A 

The  left  hand  side  is  really  4>  ,  the  noisy  model  data,  hence 

O 

A  A  A 


(30) 


„  =  S  <f> 


(3D 


This  means  that  our  model  data  <J>S  will  deviate  more  and  more  from 

A  A 

<f>  the  more  the  S  matrix  deviates  from  the  unit  matrix. 


Thus,  both  U  and  V  matrices  from  the  singular  value  decompo¬ 
sition  of  B  are  useful  in  determining  the  robestness  of  the 
inversion  calculations. 

A 

The  elements  of  the  system  matrix  B  as  we  have  already 
point  out  are  experimentally  determined  quantities.  For  the  ill 

A 

conditioned  B,  we  must  accept  the  fact  that  the  rank  of  the  matrix 


L 
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is  poorly  determined  numerically  and  may  well  change  as  the 
matrix  elements  vary  by  very  small  amounts.  An  added  advantage 
of  singular  value  decomposition  is  that  the  singular  values  are 
stable  to  perturbations  in  the  matrix  elements  in  that  perturba¬ 
tions  of  the  matrix  elements  produce  perturbations  in  the  singular 
values  of  the  same  order  of  magnitude.  This  is  certainly  not  the 
case  with  the  corresponding  eignvalues,  should  they  exist! 

Thus  far  we  have  discussed  the  strategy  of  the  method  of 
singular  value  decomposition.  The  taotias  (i.e.,  the  actual 
programs,  etc.)  are  fortunately  available.  Golub  and  Reinech 
have  developed  an  ingenious  method  of  computing  the  singular 
values  of  an  M  x  N  matrix  which  is  numerically  very  stable.  The 
algorithm  itself  is  too  complicated  to  describe  as  it  employes 
methods  generally  known  only  to  specialists  in  numerical  linear 
algebra. 
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MISSION 
of 

Rome  Air  Development  Center 

RA VC  plans  and  execute*  res  earch,  development,  test  and 
detected  acquisition  programs  in  support  of  Command,  Control 
Communications  and  Intelligence  (C3 1)  activities .  Technical 
and  engineering  support  within  aneM  of  technical  competence 
is  provided  to  ESP  Poogtiajm  Unices  IPOs)  and  other  ESP 
elements .  The  principal  technical  mission  oJieas  one 
communications ,  electromagnetic  guidance  and  control,  sur¬ 
veillance  o f  ground  and  aerospace  objects,  intelligence  data 
collection  and  handling,  information  system  technology, 
ionospheric  propagation,  solid  state  sciences,  microwave 
physics  and  electronic  reliability,  maintainability  and 
compatibility. 


